Abstract

The first morphological learner based 
upon the theory of Whole Word Mor- 
phology (Ford et al., 1997) is outlined, 
and preliminary evaluation results are pre- 
sented. The program, Whole Word Mor- 
phologizer, takes a POS-tagged lexicon 
as input, induces morphological relation- 
ships without attempting to discover or 
identify morphemes, and is then able to 
generate new words beyond the learning 
sample. The accuracy (precision) of the 
generated new words is as high as 80% us- 
ing the pure Whole Word theory, and 92% 
after a post-hoc adjustment is added to the 
routine. 

The aim of this project is to develop a computa- 
tional model employing the theory of whole word 
morphology (Ford et al., 1997) capable on the one 
hand of identifying morphological relations within a 
list of words from any one of a wide variety of lan- 
guages and, on the other, of putting that knowledge 
to use in creating previously unseen word forms. 
A small application called Whole Word Morpholo- 
gizer which does just this is outlined and discussed. 
In particular, this approach is set against the liter- 
ature on computational morphology as an entirely 
different way of doing things which has the potential 
to be generalized to all known varieties of morphol- 
ogy in the world's languages, a feature not shared by 
previous methods. As it is based on a model of the
mental lexicon in which all entries are entire, fully
fledged words, this project also serves as an empirical
demonstration that a word-based morphological
theory that rejects the notion of morpheme as minimal
unit of form and meaning (and/or grammatical
properties) is viable from the point of view of acquisition
as well as generation.

1 Morphological learning 

Since its inception in the mid 1950s, the field of 
computational morphology has been characterized 
by a paucity of procedures for generation. Notwith- 
standing the impressive body of literature on the 
shortcomings of traditional Paninian morphology, 
most computational research projects also rely on a 
traditional notion of the morpheme and ignore all 
non-compositional aspects of morphology. These 
observations are obviously not unrelated and are in 
part inherited from the field of computational syntax 
where applications traditionally were designed to as- 
sign a syntactic structure to a given string of words, 
though this is less true today. 

1.1 Segmentation and morpheme identification 

Word formation and the population of the lexicon, 
while central to morphological theory, are notice- 
ably absent from the field of computational mor- 
phology. Most computational work in the field 
of morphology has focused on the identification of 
morphemes or morphological parsing while paying 
little or no attention to generation. While these ap- 
plications find a common goal in the automatic ac- 
quisition of morphology, it is helpful to distinguish 
between two types of analysis in light of the often 
very different results sought by various morphologi- 
cal learners. 

On the one hand, some applications focus ex- 
clusively on the segmentation of words or longer 
strings into smaller units. In other words, their
function is to identify morpheme boundaries within
words and, as such, they only indirectly identify 
morphemes as linguistic units. Zellig Harris's (Har- 
ris, 1955; Harris, 1967) pioneering work suggests 
that morpheme boundaries can be determined by 
counting the number of letters that follow a given 
substring within a corpus (v. (Hafer and Weiss, 
1974) for a further development of Harris's ideas). 
Janssen (1992) and Flenner (1994; 1995) also work 
towards segmenting words but use training corpora 
in which morpheme boundaries have been manually 
inserted. Recent work by Kazakov and Manand- 
har (1998) combines unsupervised and supervised 
learning techniques to generate a set of segmenta- 
tion rules that can further be applied to previously 
unseen words. 

On the other hand, some computational morpho- 
logical applications are designed solely to identify 
morphemes based on a training corpus and not to 
provide a morphological analysis for each word of 
that corpus. Brent (1993), for example, aims at find- 
ing the right set of suffixes from a corpus, but the 
algorithm cannot double as a morphological parser. 

More recently, systems have been developed
that both identify morphemes and perform some sort of
analysis. Schone and Jurafsky (2001) employ a great
many sophisticated post-hoc adjustments to obtain 
the right conflation sets for words by pure corpus 
analysis without annotations. Their procedure uses 
a morpheme-based model, provides an analysis of 
the words, and does in a sense discover morphologi- 
cal relations. Goldsmith (2001b; 2001a), inspired by 
de Marcken's (1995) thesis on minimum description 
length, attempts to provide both a list of morphemes 
and an analysis of each word in a corpus. Also, Ba- 
roni (2000) aims at finding a set of prefixes from a 
corpus, together with an affix-stem parse of each of 
the words. 

While they might differ in their methods or ob- 
jectives, all of the above morphological applications 
share a common characteristic in that they are learn- 
ers designed exclusively for the acquisition of mor- 
phological facts from corpora and do not generate 
new words based on the information they acquire. 

1.2 Parsing and generation 

Only a handful of programs can both parse and generate
words. Once again, these programs fall into
two very distinct categories. In view of the disparity
between these programs, it is useful to distinguish
between genuine morphological learners able
to generate from acquired knowledge and genera- 
tors/parsers that implement a man-made analysis. 
The latter group is perhaps the most well known, so 
let us begin with them. 

Kimmo-type applications of two-level morphol- 
ogy (Koskenniemi, 1983; Antworth, 1990; Kart- 
tunen et al., 1992; Karttunen, 1993; Karttunen, 
1994) can provide a morphological analysis of the 
words in a corpus and generate new words based on 
a set of rules; but these programs must first be pro- 
vided with that set of rules and a lexicon contain- 
ing morphemes by the user. Similar work in one- 
and two-level morphology has been done using the 
Attribute-Logic Engine (Carpenter, 1992). Some of 
these systems (e.g. Karttunen et al., 1987) have
a front-end that compiles more traditional linearly 
ordered morphological rules into the finite-state au- 
tomata of two-level morphology. Once again, these 
applications require a set of man-made lexical rules 
to function. While the practical uses of such applica- 
tions as PC-Kimmo are incontestable, it is clear that 
they are part of a different endeavour, and should not 
be confused with genuine morphological learners. 

The other relevant group of computational appli- 
cations can, as mentioned, both acquire morpho- 
logical knowledge from corpora and generate new 
words based on that knowledge. Albright and Hayes 
(2001a; 2001b) tackle the wider task of acquir- 
ing morphology and (morpho)phonology based on 
a small paradigm list and their learner is able to gen- 
erate particular inflected forms given a related word. 
Dzeroski and Erjavec (1997) work towards learning 
morphological rules for forming particular inflec- 
tional forms given a lemma (a set of related words). 
Their learner produces a set of rules relating all the 
members of a paradigm to a base form. The program 
can then produce a member of that paradigm on 
command given the base form. While the methods 
used by Albright and Hayes and Dzeroski and Er- 
javec radically differ, both use a form of supervised 
learning which significantly reduces the amount of 
information their learner has to acquire. Albright 
and Hayes train their program using a paradigm list 
in which each entry contains, for example, both the 
present and past tense forms of an English verb.
Similarly, the training data used by Dzeroski and Erjavec
has a base form, or lexeme, associated with
each and every word so that all the words
of a given paradigm share a common label. The
distinctions between the two methods are immaterial;
what matters is that both learners are told
which words are related to which and are left with
the task of describing that relation in the form of a rule.
In other words, the algorithms they use cannot discover
that words are morphologically related.

1.3 What's morphology? 

In the above algorithms, the task of determining 
whether one word is related to another in a morpho- 
logical sense is most frequently left to the linguist, 
as this information has to be encoded in the train- 
ing data for these algorithms. (Some of the most 
recent work such as (Schone and Jurafsky, 2001) 
and (Goldsmith, 2001b) are notable exceptions to 
this paradigm.) This is perhaps not surprising, since 
no serious attempt at defining a morphological rela- 
tion has been made in the last few decades. Amer- 
ican structuralists of the forties and fifties proposed 
what have been referred to as discovery procedures 
(v. (Nida, 1949), for example) for the identification 
of morphemes but since the mid fifties (Chomsky, 
1955), it has been customary for morphological the- 
ory to ignore this aspect of morphology and relegate 
it to studies on language acquisition. But, since a 
morphological learner like that presented here is de- 
signed to model the acquisition of morphology, it 
seems that it should above all be able to determine 
for itself whether two words are morphologically re- 
lated or not, whether there is anything morphologi- 
cal to acquire at all. 

Another important thing to note about the vast 
majority of computational morphology learners is 
their reliance on a traditional notion of the mor- 
pheme as a lexical unit and their exclusive fo- 
cus on concatenative morphology. There is a 
panoply of recent publications devoted to the em- 
pirical shortcomings of traditional so-called "Item- 
and-Arrangement" morphology (Hockett, 1954; 
Bochner, 1993; Ford and Singh, 1991; Anderson, 
1992; Ford et al., 1997), and the list of phenomena 
that fall out of reach of a compositional approach 
is rather impressive: zero-morphs, ablaut-like processes,
templatic morphology, class markers, partial
suppletion, etc. Still, seemingly every documented
morphological learner relies on a Bloomfieldian no- 
tion of the morpheme and produces an Item-and- 
Arrangement analysis; this description applies to all 
of the computational papers cited above. 

2 An alternative theory 

Whole Word Morphologizer (henceforth WWM) is 
the first implementation of the theory of Whole 
Word Morphology. The theory, developed by Alan 
Ford and Rajendra Singh at Université de Montréal,
seeks to account for morphological relations in a 
minimalist fashion. Ford and Singh published a se- 
ries of papers dealing with various aspects of the the- 
ory between 1983 and 1990. Drawing on these pa- 
pers, they published a full outline of it in 1991 (Ford 
and Singh, 1991) and an even fuller defense of it 
in 1997 (Ford et al., 1997). Since then, aspects of it
have been taken up in a series of publications by Ag- 
nihotri, Dasgupta, Ford, Neuvel, Singh, and various 
combinations of these authors. The central mech- 
anism of the theory, the Word Formation Strategy 
(WFS), is a sort of non-decomposable morpholog- 
ical transformation that relates full words with full 
words (or helps one fashion a full word from an- 
other full word) and parses any complex word into 
a variable and a non-variable component. Neuvel 
and Singh (In press) offer a strict definition of mor- 
phological relatedness and, based on this definition, 
suggest guidelines for the acquisition of Word For- 
mation Strategies. 

In Whole-Word Morphology, any morphological 
relation can be represented by a rule of the following 
form: 

(1) $|X|_{\alpha} \leftrightarrow |X'|_{\beta}$

in which the following conditions and notations are 
employed: 

1. $|X|_{\alpha}$ and $|X'|_{\beta}$ are statements that words of the
form X and X' are possible in the language,
and X and X' are abbreviations of the forms of
classes of words belonging to categories $\alpha$ and
$\beta$ (with which specific words belonging to the
right category can be unified in form);

2. ' represents all the form-related differences between
X and X';

3. $\alpha$ and $\beta$ are categories that may be represented
as feature-bundles;

4. $\leftrightarrow$ represents a bi-directional implication;

5. X' and X are semantically related.

There are several ramifications of (1). First, there 
is only one morphology; no distinction, other than 
a functional one, is made between inflection and 
derivation. Second, morphology is relational and not 
compositional. The program thus makes no refer- 
ence to theoretical constructs such as 'root', 'stem', 
and 'morpheme', or devices such as 'levels' and 
'strata' and relies exclusively on the notion of mor- 
phological relatedness. And since its objective is 
not to assign a probability to a given word or string, 
it must rely on a strict formal definition of a mor- 
phological relation. Ultimately, the theory takes the 
Saussurean view that words are defined by the differ- 
ences amongst them and argues that some of these 
differences, namely those that are found between 
two or more pairs of words, constitute the domain 
of morphology. In other words, two words of a lexi- 
con are morphologically related if and only if all the 
differences between them are found in at least one 
other pair of words of the same lexicon. 

3 Overview of the method 

Under the assumption that the morphology of a lan- 
guage resides exclusively in differences that are ex- 
ploited in more than one pair of words within its lex- 
icon, WWM (Algorithm 1 in the next section) com- 
pares every word of a small lexicon and determines 
the segmental differences found between them. The 
input to the current version of the program is a small 
text file that contains anywhere from 1000 to 5000 
words. Each word appears in orthographic form and 
is followed by its syntactic and morphological cate- 
gories, as in the example below: 

(2) cat, Ns (Noun, singular) 

catch, V 

catches, V3s (Verb, (pres.) 3rd pers. 
sing.) 

decided, Vp (Verb, past) 
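As a minimal sketch of how input like (2) might be read, the following assumes one entry per line in the form "word, TAG", optionally followed by a parenthesised gloss. The function name `parse_lexicon` and the exact file layout are assumptions made for illustration; the text does not specify the file format beyond the example.

```python
def parse_lexicon(text):
    """Parse lines such as 'cat, Ns (Noun, singular)' into (word, tag) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first comma only: the gloss may contain commas too.
        word, _, rest = line.partition(",")
        # Keep the tag and drop any parenthesised gloss that follows it.
        tag = rest.split("(")[0].strip()
        entries.append((word.strip(), tag))
    return entries


sample = """cat, Ns (Noun, singular)
catch, V
catches, V3s (Verb, (pres.) 3rd pers. sing.)
decided, Vp (Verb, past)"""

lexicon = parse_lexicon(sample)
```

Each entry is kept as a (form, category) pair because the category takes part in every later comparison alongside the form.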

The algorithm simply compares each letter from 
word A to the corresponding one from word B to 
produce a comparison record, which can be viewed
as a data structure. Currently, it works on orthographic
representations. It would work just as easily
on phonemic transcriptions, but it will require
empirical evaluation to see whether the results
from these can improve upon those obtained using
spellings, and we have not yet gone through such an
exercise. It starts on either the left or right edge of
the words if the two words share their first (few) seg- 
ments or their last (few) segments, respectively (the 
forward version is presented in Algorithm 2 in the 
next section). This is just a simple-minded way of 
aligning the similar parts of the words for the com- 
parison; a more sophisticated implementation in the 
future could use a more general sequence alignment 
procedure. The segments are placed in one of two 
lists in the comparison structure (differences or sim- 
ilarities) based on whether or not they are identical. 
Each comparison structure also contains the cate- 
gories of both words, and is kept in a large list of all 
comparison structures found from analyzing the en- 
tire corpus. The example below shows the informa- 
tion in the comparison structure produced from the 
English words receive and reception. It includes the 
differences and similarities between the two words, 
from the perspective of each word in turn, as well as 
the lexical categories of the words. 



(3) Differences
    First word      Second word
    ####ive V       ####ption Ns

    Similarities
    First           Second
    rece###         rece#####
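The record in (3) can be reproduced with a short position-by-position comparison. This is a sketch of the forward (left-edge) case only; the function name `compare_forward` and the choice to return four strings are assumptions for illustration, not the data structure used in WWM.

```python
def compare_forward(w1, w2):
    """Compare two words letter by letter, starting from the left edge.

    Returns (dif1, dif2, sim1, sim2). In the difference strings '#' masks
    letters that match; in the similarity strings '#' masks letters that differ.
    """
    shared = min(len(w1), len(w2))
    dif1, sim1, dif2, sim2 = [], [], [], []
    for i, ch in enumerate(w1):
        same = i < shared and ch == w2[i]
        dif1.append("#" if same else ch)
        sim1.append(ch if same else "#")
    for i, ch in enumerate(w2):
        same = i < shared and ch == w1[i]
        dif2.append("#" if same else ch)
        sim2.append(ch if same else "#")
    return "".join(dif1), "".join(dif2), "".join(sim1), "".join(sim2)


record = compare_forward("receive", "reception")
```

Because the comparison is purely positional, letters past the end of the shorter word always count as differences.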

Matching character sequences in the difference 
section are replaced with a variable. The re- 
sult is then set against comparisons generated by 
other pairs of words and duplicate differences are 
recognized. In the example below, the compar- 
isons produced by the pairs receive/reception, con- 
ceive/conception and deceive/deception are shown. 



(4) Differences
    First word      Second word
    Xive V          Xption Ns
    Xive V          Xption Ns
    Xive V          Xption Ns

    Similarities
    First           Second
    rece###         rece#####
    conce###        conce#####
    dece###         dece#####
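Recognizing duplicate differences amounts to counting identical difference signatures across word pairs. The sketch below simplifies the comparison by taking the longest shared prefix as the variable X; the names `difference_signature` and `pairs`, and the use of a `Counter`, are illustrative assumptions rather than the program's own bookkeeping.

```python
from collections import Counter


def difference_signature(w1, cat1, w2, cat2):
    """Replace the shared prefix of two words with the variable 'X'."""
    n = 0
    while n < min(len(w1), len(w2)) and w1[n] == w2[n]:
        n += 1
    return (("X" + w1[n:], cat1), ("X" + w2[n:], cat2))


pairs = [
    ("receive", "V", "reception", "Ns"),
    ("conceive", "V", "conception", "Ns"),
    ("deceive", "V", "deception", "Ns"),
    ("cat", "Ns", "catch", "V"),
]

# Identical signatures are the duplicate differences mentioned in the text.
counts = Counter(difference_signature(*p) for p in pairs)
```

Under the definition of morphological relatedness given earlier, a signature attested by a single pair only, such as the one for cat/catch here, would not count as morphological.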



The three comparisons in (4) share the same formal
and grammatical differences, and so the theory
indicates they should be merged into one morpho- 
logical strategy. Since the differences are the same, 
it is only the similarities that are actually merged. 
Each new morphological strategy is also restricted 
to apply in as narrow an environment as possible. 
Neuvel and Singh (In press) suggest
that any morphological strategy must be maximally
restricted at all times; this is accomplished by
specifying as constant all the similarities found, not 
between words, but between the similarities found 
between words. In (4), all three sets of similarities 
end with the sequence of letters "ce." These similar- 
ities between similarities are specified as constant in 
each strategy and the length of each word is also fac- 
tored in. The merge routine called in Algorithm 2 
carries out this procedure; we don't show it because 
it is tedious but not especially interesting. The re- 
stricted morphological strategy relating the words in 
(4) is as follows: 

(5) Differences
    First word      Second word
    Xive V          Xption Ns

    Similarities
    First           Second
    *##ce###        *##ce#####
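The merge routine itself is not shown in this paper, so the following is only one plausible sketch of the restriction step for the suffixing case. It right-aligns the shared portions of the merged word pairs, keeps letters common to all of them, and records the length range with '#' (required letter) and '*' (optional letter). The function name `merge_similarities` and the right-alignment choice are assumptions.

```python
def merge_similarities(stems):
    """Merge the shared portions of several word pairs into one pattern.

    Stems are aligned at their right edge. A position keeps its letter when
    every stem agrees on it, becomes '#' when the stems disagree, and becomes
    '*' when only the longer stems have a letter there.
    """
    shortest = min(len(s) for s in stems)
    longest = max(len(s) for s in stems)
    merged = []
    for k in range(1, shortest + 1):  # k-th letter counted from the right
        letters = {s[-k] for s in stems}
        merged.append(letters.pop() if len(letters) == 1 else "#")
    merged.extend("*" * (longest - shortest))
    return "".join(reversed(merged))


pattern = merge_similarities(["rece", "conce", "dece"])
```

For the three pairs in (4) this yields the constant "ce" preceded by two required letters and one optional letter, which is the environment stated in (6).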

For the sake of clarity, we can represent the infor- 
mation contained in (5) in a more familiar fashion 
using the formalism described in (1). The vertical 
brackets '| |' are used for orthographic forms so as
not to confuse them with phonemic representations. 



(6) |*##ceive|V <-> |*##ception|Ns



The '#' signs in the above representations stand 
for letters that must be instantiated but are not spec- 
ified; the '*' symbol stands for a letter that is not 
specified and that may or may not be instantiated. 
Strategy (6) can therefore be interpreted as follows: 

(6') If there is a verb that ends with the sequence 
"ceive" preceded by no less than two and 
no more than three characters, there should 
also be a singular noun that ends with the se- 
quence "ception" preceded by the same two 
or three characters. 
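The reading in (6') can be checked mechanically by translating each side of a strategy into a regular expression. This is a sketch under the assumption that '*' means "at most one unspecified letter" and '#' means "exactly one unspecified letter", as stated above; the helper name `pattern_to_regex` is hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate '*' (optional letter) and '#' (required letter) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".?")
        elif ch == "#":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


verb_side = pattern_to_regex("*##ceive")
candidates = ["receive", "conceive", "perceive", "ceive", "misconceive"]
matches = [w for w in candidates if verb_side.fullmatch(w)]
```

Words with fewer than two or more than three letters before "ceive" fail to match, which is the restriction described in (6').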

After performing the comparisons and merging, 
WWM extracts a list of morphological strategies, 
which are those comparison structures whose count 
is more than some fixed threshold. Table 1 con- 
tains a few strategies found from the first few 
chapters of Moby Dick. These strategies result 
from merging comparison structures which have the 
same differences — merging the similarities of sev- 
eral unifiable word pairs, and so many have no spec- 
ified letters at all. 

WWM then goes through the lexicon word by
word and attempts to unify each word in form and
category with the left or right side of each strategy.
If it succeeds, WWM replaces all the segments fully 
specified on the side of the strategy the word is uni- 
fied with, with the segments fully specified on the 
other side. For example, given the noun perception 
in the corpus and strategy (6), WWM will map the 
word onto the right hand side of (6), take out the se- 
quence "ception" from the end and replace it with 
the sequence "ceive" to produce the new word per- 
ceive. The category of the word will also be changed 
from singular noun to verb. New words can thus be 
generated in a rather obvious fashion by taking each 
word in the original lexicon and applying any strate- 
gies that can be applied, i.e. whose orthographic 
form and part of speech can be unified with the word 
at hand. Algorithm 3 shows the basic generation 
procedure; once again the routines called unify 
and create which implement the nitty-gritty de- 
tails of the above description are not given because 
they are more tedious than interesting, and will cer- 
tainly need to be changed in more general future 
versions of WWM. Table 2 gives some of the new 
words WWM creates using text from Le petit prince
as its base lexicon. 
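The perception/perceive example can be traced with a small sketch of unification and creation for a single strategy. The representation of a strategy as two (pattern, constant ending, category) triples and the names `STRATEGY`, `to_regex` and `apply_strategy` are assumptions for illustration; the unify and create routines of WWM are not given in the paper.

```python
import re

# Strategy (6): each side is (pattern, fully specified ending, category).
# '*' is an optional letter and '#' a required one, as in the text.
STRATEGY = (("*##ceive", "ceive", "V"), ("*##ception", "ception", "Ns"))


def to_regex(pattern):
    body = "".join(
        ".?" if c == "*" else "." if c == "#" else re.escape(c) for c in pattern
    )
    return re.compile(body)


def apply_strategy(word, cat, strategy):
    """Return the (word, category) pairs this strategy licenses for an entry."""
    out = []
    # Try the strategy in both directions, since the relation is bidirectional.
    for src, dst in (strategy, strategy[::-1]):
        pattern, ending, src_cat = src
        if cat == src_cat and to_regex(pattern).fullmatch(word):
            stem = word[: len(word) - len(ending)]
            out.append((stem + dst[1], dst[2]))
    return out


new = apply_strategy("perception", "Ns", STRATEGY)
```

Both form and category must unify before anything is created, so a noun that merely ends in "ceive" would not trigger the verb side.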



Table 1: Word-formation strategies discovered from Moby Dick

Differences              Similarities
1st word    2nd word     1st word         2nd word        Examples
Xd PP       X V          [unreadable]     [unreadable]    baked/bake, charged/charge
Xed PP      X V          *########        *######         directed/direct
Xs Np       X Ns         [unreadable]     [unreadable]    helmets/helmet, rabbits/rabbit
Xing GER    Xed PP       ******#######    [unreadable]    walking/walked, talking/talked
Xing GER    Xs V3s       [unreadable]     [unreadable]    walking/walks, talking/talks
Xness Ns    X ADJ        ****#########    [unreadable]    short/shortness
Xly ADV     X ADJ        [unreadable]     [unreadable]    easy/easily, quick/quickly
Xest ADJ    X ADJ        [unreadable]     *####           hardest/hard, shortest/short
Xs V3s      X V          ***#####         ***####         jumps/jump, plays/play
Xer ADJ     X ADJ        *######          *####           harder/hard, louder/loud
Xless ADJ   X Ns         [unreadable]     *####           painless/pain, childless/child
Xing GER    Xy ADJ       [unreadable]     [unreadable]    raining/rainy, running/runny
Xed PP      Xs V3s       **######         **#####         played/plays
Xings Np    X V          [missing]        [missing]       paintings/paint



Table 2: Words generated from Le petit prince

drames      Np     droitement      ADV
dressée     PF     drôles          AIP
dresser     INF    drôlement       ADV
dressa      Vp3    dunes           Np
dressais    Vi2    durerait        Vc3
dresse      V3     décidée         PF
dressent    V6     décider         INF
dressez     V5     décida          Vp3
dressait    Vi3    décide          V3
droits      AMP    décoiffé        AM
droites     AFP    déconcentrés    AMP



The output from the algorithm is a list of words, 
much as in Table 2, which are generated from the in- 
put corpus using the morphological relations (strate- 
gies) discovered. The method described above will 
clearly force WWM to create words that were al- 
ready part of its original lexicon; in fact, each and 
every word involved in licensing the discovery of 
a morphological strategy will be duplicated by the 
program. Generated words that were not part of
WWM's original lexicon are then added to a separate
word list containing only new words. If desired,
this new word list can be merged with the original
lexicon for another round of discovery to formulate
new strategies based on a larger dataset. Additionally,
each of the new words can simply be put
through another cycle of word creation by applying
the same strategies as before a second time.

1 By word we mean an orthographic form together with the
part of speech. Further work in this vein would add meanings
as well.

4 Implementation 

This section contains some pseudocode showing 
several basic components of the Whole Word Mor- 
phologizer. Algorithm 1 shows the main procedure, 
which takes a POS-tagged lexicon as input and out- 
puts a list of all words that are possible given the 
morphological relations present in the lexicon. 

The two procedures compforward and compbackward
are symmetrical, so Algorithm 2 shows
just the first of these. This algorithm provides the
data structure which includes the differences and
similarities between each pair of words in the lexicon,
in similar fashion to the examples in the preceding
section. In practice, only those pairs of words
which are by some heuristic sufficiently similar in
the first place are compared. Additionally, the two
similarities sequences for each word pair are actually
represented as one sequence which encodes the
information found in the two sequences of the examples
in the preceding section; this is just for convenience of
storage and computation.

Algorithm 1 WWM(lexicon)
Require: lexicon to be a list of POS-tagged words.
Ensure: a list newwords is generated
  for all tagged words w_i do
    for all tagged words w_j do
      if w_i and w_j share a beginning sequence then
        compforward(w_i, w_j)
      else if w_i and w_j share an ending sequence then
        compbackward(w_i, w_j)
      end if
    end for
  end for
  for all comparison structures in the list do
    if count(comparison) > Threshold then
      append comparison to the list strategies
    end if
  end for
  generate(lexicon, strategies)

Algorithm 3 shows the outline of the final stage, 
which generates an output list of words from the in- 
put lexicon and the morphological strategies. The 
strategy list is simply a list of all comparison struc- 
tures that occurred more frequently than some arbi- 
trary threshold number. 

5 Accomplishments and prospects 
5.1 Initial results 

Whole Word Morphologizer has been tested on a 
limited basis using English and French lexicons of 
approximately 3000 entries, garnered from the POS- 
tagged versions of Le petit prince and Moby Dick. 
The program initially, without any post-hoc correc- 
tions, achieved between 70% and 82% accuracy in 
generation; these figures measure the percentage of 
the new words beyond the original lexicon that are 
possible words of the language. The figures thus 
measure a kind of precision value, in terms of the 
precision/recall tradeoff, and are fair values in that 
they do not include the generated words that are al- 
ready in the lexicon. 
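The accuracy figure described here can be written down as a short computation. In this sketch the judgement of whether a generated form is a possible word is passed in as a function, standing in for the human judgement the evaluation relies on; the name `generation_precision` and the toy data are invented for illustration and are not results from the paper.

```python
def generation_precision(generated, lexicon, is_possible_word):
    """Share of genuinely new generated words that are judged possible."""
    known = set(lexicon)
    # Words already in the lexicon are excluded, and duplicates counted once.
    new_words = [w for w in dict.fromkeys(generated) if w not in known]
    if not new_words:
        return 0.0
    accepted = sum(1 for w in new_words if is_possible_word(w))
    return accepted / len(new_words)


# Toy data, invented for illustration only.
lexicon = ["receive", "reception", "perception", "conjugues"]
generated = ["receive", "perceive", "conjuguer", "conjuguere"]
judged_possible = {"perceive", "conjuguer"}

precision = generation_precision(generated, lexicon, judged_possible.__contains__)
```

Excluding regenerated lexicon entries matters: counting them would inflate the figure, since every word that licensed a strategy is reproduced by it.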



Algorithm 2 compforward(w1, w2)
Require: w1 and w2 to be (word, category) pairs.
Ensure: a data structure comparison documenting the different and
  similar letters between w1 and w2 is merged into the global list of
  comparisons. comparison is a structure of 5 lists:
  w1dif, w1cat, w2dif, w2cat, sim.
  for x = 1 to length(w2) do
    if characters w1(x) = w2(x) then
      append w1(x) to list sim
      if list w1dif does not end with 'X' then
        append 'X' to both lists w1dif and w2dif
      end if
    else
      append w1(x) to w1dif, append w2(x) to w2dif, append '#' to sim
    end if
  end for
  for x = length(w2) + 1 to length(w1) do
    append w1(x) to w1dif
  end for
  if dif lists and categories match a comparison already in the list comps then
    merge comparisons and increment count(comparison)
  else
    append comparison to comps
    count(comparison) <- 1
  end if


A recall metric in the usual sense seems impossible
to define here. First of all, there are
generally an indefinite number of possible words in a 
language. One therefore cannot give a precise set of 
words that we wish the system could generate from 
a specific lexicon, so there seems to be no way to 
measure the percentage of "desired words" that are 
in fact generated. Even if we were to make such a 
list by hand from the current small corpora to use as 
a gold standard (which has been suggested by a ref- 
eree), it must also be remembered that WWM dis- 
covers strategies (morphological relations) for cre- 
ating new words from given ones. It cannot be ex- 
pected to discover strategies that are not evident in a 
corpus. Indeed, WWM will never discover that, for
example, 'am' and 'be' are related, because according
to the theory of morphology being applied these
words are only related by convention, not by morphology.
"Nonproductive morphology" is not really
morphology.

Algorithm 3 generate(lexicon, strategies)
Ensure: a list newwords is generated using lexicon and strategies
  for all words w in lexicon do
    for all strategies s in strategies do
      if unify(w, s) says the word and strategy match with either
          left or right alignment then
        newword <- create(w, s)
        if newword is in neither the lexicon nor the list newwords then
          append newword to newwords list
        end if
      end if
    end for
  end for

The real point is that we do not want to hold 
WWM's performance up against our own ideas 
about morphological relations among words, since 
it would be practically impossible to determine not 
merely a large set of possible words that linguists 
think are related to those in the corpus, but rather a 
set of possible words that WWM ought to generate 
according to its theory. This would amount to try- 
ing to beat WWM at its own game in pursuit of a 
gold standard, which could only be obtained using a 
better implementation of WWM's theory. A perfect 
implementation of Whole Word Morphology would 
have perfect recall, in view of our eventual goal of 
using this theory to inform us about the morphology 
of a language — about what ought to be recalled. We 
are not trying to learn something that we feel is al- 
ready known. 

5.2 What's learning? 

It is worth considering the endeavor of learning morphology
in terms of formal learning theory, as presented
in Osherson et al. (1986) or Kanazawa (1998),
for example. In the classical framework, the problem
of learning a language from positive example
data is approached by considering the successive
guesses at the target language that a purported
learner makes when presented with some sequentially
increasing learning sample drawn from that
language. Considering just morphology, it seems
that the target language is the set of all possible 
words of the natural language at hand, a possibly 
infinite (or at least indefinite) set. WWM's output 
is a list of generated words subsuming the corpus, 
which are supposed to be all the words creatable by 
applying its idea of morphology to that corpus. It 
can thus be viewed as making a guess about the tar- 
get language, given a certain learning sample. If the 
learning sample is increased, its guess increases in 
size also. The errors in precision of course mean 
that at the current corpus sizes its guesses are for the 
moment not even subsets of the target language. 

According to one classic paradigm, a system 
would be held to be a successful learner if it could 
be proven to home in on the target language as the 
learning sample increased in size indefinitely. This 
is Gold's (1967) criterion of identification in the 
limit. In this framework, an empirical analysis can- 
not be used to decide the adequacy of a learner, and 
we would like to deemphasize the importance of the 
empirical results for this purpose. That said, the em- 
pirical results are for now all we have to show, but 
eventually we hope to produce a mathematical proof 
of just what WWM can learn, and just what kinds of 
lexicons are learnable in Gold's sense. 

To our knowledge, it has never been proven 
whether the total lexicon of a natural language is 
identifiable in the limit from the sort of data we pro- 
vide (i.e. POS-tagged words), using in particular the 
theory of Whole Word Morphology in a perfect fash- 
ion. Still, it is interesting that nothing about this lan- 
guage learning paradigm says anything about mor- 
phological analysis. The current crop of true mor- 
phological learners, e.g. (Goldsmith, 2001b), en- 
deavor to learn to analyze the morphology of the 
language at hand in the manner of a linguist. Gold- 
smith has even called his Linguistica system a "lin- 
guist in a box." This is perhaps an interesting and 
worthwhile endeavor, but it is not one that is un- 
dertaken here. WWM is instead attempting to learn 
the target language in a more direct way from the 
data, without first constructing the intermediary of 
a traditional morphological analysis. We are thus 
not learning the linguist's notion of morphology but 
rather the result of morphology, i.e. the word forms
of the language together with the other information
that goes into a word. 2 

5.3 Post-hoc fixes and future developments 

A significant proportion of errors in generation re- 
sult from the application of competing ambiguous 
morphological strategies. For example, when using
the (French) text of Le petit prince as its base
lexicon, WWM produces two strategies relating 2nd 
person verb forms to their infinitives. Given the verb 
conjugues 'conjugate,' pres. 2nd sing., one strategy 
produces the correct -er class infinitive conjuguer 
while the other creates the non-word *conjuguere, 
based on the relation among -re verb forms like 
fais/faire 'do' and vends/vendre 'sell.' This is be- 
cause of an inherent ambiguity among various word 
pairs which do not fully indicate the paradigms of 
which they are a part. WWM then adds to its lex- 
icon, not only the correct form, but all the outputs 
warranted by its grammar. 

To try to correct this problem, a form of lexical 
blocking has been implemented in the current ver- 
sion of the program. WWM creates every possible 
word, including different strategies giving the same 
one, and lets lexical lookup take precedence over 
productive morphology. The knowledge WWM pos- 
sesses about its lexicon increases considerably dur- 
ing the creation of morphological strategies. The 
program learns not only which strategies are li- 
censed by a given lexicon, but also which words 
of its lexicon are related to one another. WWM 
can assign a number to every lexical entry and give 
the same "paradigm" number to related words. Be- 
fore adding a newly created word to its lexicon, the 
program looks for an existing word with the same 
paradigm number and category. For example, if 
WWM maps the word decoction, which was as- 
signed to, say, paradigm 489 onto a strategy creating 
plural nouns, it will look for a plural noun belonging 
to paradigm 489 in its lexicon before it adds decoc- 
tions to the list of new words. 
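The blocking check can be sketched as a lookup on (paradigm number, category) before a generated word is accepted. The triple representation, the function name `add_with_blocking`, the tag "V2s" and the paradigm number 12 are all hypothetical choices for illustration; only paradigm 489 and the decoction example come from the text.

```python
def add_with_blocking(new_word, category, paradigm, lexicon, new_words):
    """Add a generated word unless its paradigm slot is already filled.

    lexicon and new_words are lists of (word, category, paradigm) triples.
    Returns True if the word was accepted, False if it was blocked.
    """
    for _word, cat, par in lexicon:
        if par == paradigm and cat == category:
            return False  # an existing entry already fills this slot
    entry = (new_word, category, paradigm)
    if entry not in new_words:
        new_words.append(entry)
    return True


lexicon = [
    ("decoction", "Ns", 489),
    ("conjugues", "V2s", 12),   # hypothetical tag and paradigm number
    ("conjuguer", "INF", 12),
]
new_words = []

accepted = add_with_blocking("decoctions", "Np", 489, lexicon, new_words)
blocked = add_with_blocking("conjuguere", "INF", 12, lexicon, new_words)
```

In this toy lexicon the infinitive slot of paradigm 12 is already occupied, so the competing form is rejected, while the plural of decoction is accepted because no plural noun exists in paradigm 489.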

Preliminary results are encouraging, with WWM
reaching up to 92% accuracy in generation after
the blocking modification. Obviously the program
needs to be systematically tested on multiple lexica
from different languages, but these results strongly
suggest that it is possible to model the acquisition
of morphology as a component of learning to generate
language directly, rather than to treat computational
learning as the acquisition of linguistic theory
as several current approaches do, e.g. (Goldsmith,
2001b).

2 In this theory, a word's form cannot be usefully divorced
from the other information that allows its proper use, and in our
implementation the POS tags (poor substitutes for what should
be a richer database of information) are crucial to the discovery
of the strategies.

Although the principles of whole word morphol- 
ogy allow one to contemplate versions of WWM that 
would work on templatic morphologies, polysyn- 
thetic languages, and a host of other recalcitrant phe- 
nomena, the current instantiation of the program is 
not so ambitious. The comparison algorithm de- 
tailed in the previous section compares words letter 
by letter, either from left to right or from right to 
left. No other possible alignments between words 
are considered and WWM is in its current state only 
capable of grasping prefixal and suffixal morphol- 
ogy. We are currently developing a more sophis- 
ticated sequence alignment routine which will al- 
low the program to handle infixing, circumfixing, 
and templatic morphologies of the Semitic type, as 
well as word-internal changes typified by Germanic 
strong verb ablaut. 